class: center, middle, inverse, title-slide

.title[
# Describing Data Graphically with R
]
.author[
### S. Mason Garrison
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://psychmethods.github.io/coursenotes/" target="_blank">Methods in Psychological Research</a>
</span>
</div>

---
class: middle

# Describing Data

---

## Hans Rosling
---

# Summarize

.pull-left[
- Transform a pile of numbers into a summary
- Descriptive Statistics
- The distribution of a variable is a table or graph showing the categories or values of outcomes and their frequency or percentage of occurrence
- Exploratory Data Analysis (Tukey, 1977)
]

.pull-right[  ]

---

# Exploratory Data Analysis

.pull-left[
- Tukey (1977)
- EDA
- Graphical Data Analysis
- Numbers as summaries
- Emphasized Robust Statistics
]

--

.pull-right[  ]

---

# Descriptive Statistics

- Examples
  - Tables
  - Graphs
  - Summary Statistics

---

# Tables

- Woodbridge (1845)

<img src="data:image/png;base64,#../img/woodbridge1845.png" width="100%" style="display: block; margin: auto;" />

---

# Graphs

- Minard (1869)

<img src="data:image/png;base64,#../img/minard.png" width="70%" style="display: block; margin: auto;" />

---

# Examples

- Summary Statistics
  - Measures of Central Tendency
  - Measures of Spread

<img src="data:image/png;base64,#../img/centralbears.jpg" width="30%" style="display: block; margin: auto;" />

---

# Categorical Variable Displays (Nominal, Ordinal)

- Frequency Distribution Graphs
  - Bar Chart
  - Pie Chart

--

- Quantitative Variables
  - Histograms
  - Stem plots

--

- Time Plots

---

# Frequency distribution graph

.pull-left[
- Bar Chart
  - Graphs of variables with the categories of outcomes on the x-axis and the frequency or percent of each category on the y-axis.
]

.pull-right[
<img src="data:image/png;base64,#../img/oresme.jpg" width="95%" style="display: block; margin: auto;" />
]

---

# Bar Graph/Chart

.pull-left[

```r
# Bar chart: count the cars at each number of gears
# (mtcars ships with base R, so no extra package is needed)
counts <- table(mtcars$gear)
```
]

.pull-right.small[

```r
barplot(counts, main="Car Distribution",
        xlab="Number of Gears")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Stacked Bar Chart

.small.pull-left[

```r
df <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50))
head(df)
```

```
##    group value
## 1   Male    25
## 2 Female    25
## 3  Child    50
```
]

.pull-right.small[

```r
library(ggplot2)
# A single bar (x = "") split by group gives a stacked bar
bp <- ggplot(df, aes(x="", y=value, fill = group))+
  geom_bar(width = 1,stat = "identity")
bp
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Pie Chart

.pull-left[
- Graphs that show each category of a variable as a slice of the pie, sized by its frequency or percent.
]

.pull-right[
<img src="data:image/png;base64,#../img/pieplay.jpg" width="90%" height="40%" style="display: block; margin: auto;" />

.footnote["A pie chart showing each state in the United States, part of Playfair's translation of A Statistical Account of the United States of America by D. F. Donnant."
]]

---

# Pie chart

.pull-left-narrow[

```r
slices <- c(10,12, 4,16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
```
]

.pull-right-wide[

```r
pie(slices, labels = lbls, main="Pie Chart of Countries")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Example 2

.small.pull-left[

```r
mytable <- table(iris$Species)
lbls <- paste(names(mytable), "\n", mytable, sep="")
pie(mytable, labels = lbls,
    main="Pie Chart of Species\n (with sample sizes)")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/example2-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-14-1.png" width="65%" style="display: block; margin: auto;" />
]

---

# Convert Bar Chart into Pie Chart

.small.pull-left[

```r
pie <- bp + coord_polar("y", start=0)
pie
```

<img src="data:image/png;base64,#descriptive_files/figure-html/polar-1.png" width="90%" style="display: block; margin: auto;" />

```r
pie + scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
```

<img src="data:image/png;base64,#descriptive_files/figure-html/polar-2.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-15-1.png" width="65%" style="display: block; margin: auto;" /><img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-15-2.png" width="65%" style="display: block; margin: auto;" />
]

[Additional Resources](http://www.sthda.com/english/wiki/ggplot2-pie-chart-quick-start-guide-r-software-and-data-visualization)

---

# Quantitative Variables

- Interval or Ratio Scales
  - Histograms
  - Stem plots
  - Time plots

---

# Histogram

- A histogram is a graphical representation of the distribution of numerical data.
- Approximates a probability distribution
- First described by Pearson in 1895.
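
One way to see the link to a probability distribution is to draw the histogram on the density scale, so that the bar areas sum to 1. The following is a minimal sketch, assuming the built-in `faithful` data set and a kernel density overlay; the variable names `eruptions`, `h`, and `total_area` are illustrative only.

```r
# Sketch: histogram on the density scale (assumes the built-in faithful data)
eruptions <- faithful$eruptions

# freq = FALSE rescales bar heights so that the total bar area is 1
h <- hist(eruptions, freq = FALSE,
          main = "Eruption Durations", xlab = "Minutes")

# Overlay a smooth kernel density estimate for comparison
lines(density(eruptions))

# Check: sum of (bar height * bar width) across all bins
total_area <- sum(h$density * diff(h$breaks))
total_area
```

With `freq = FALSE`, the y-axis shows density instead of counts, which makes histograms with different sample sizes or bin widths easier to compare.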
---

# Histogram

.pull-left[

```r
library(MASS) # provides the cats data set
variable<-cats$Bwt # body weight in kilograms
hist(variable)
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-16-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[

```r
# Convert kilograms to pounds (1 kg is about 2.2 lb)
variable<-variable*2.2
hist(variable)
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-17-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Stemplot

- Sometimes called a stem and leaf diagram

```r
# Stem and Leaf plot
stem(faithful$eruptions,scale=1)
```

```
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   16 | 070355555588
##   18 | 000022233333335577777777888822335777888
##   20 | 00002223378800035778
##   22 | 0002335578023578
##   24 | 00228
##   26 | 23
##   28 | 080
##   30 | 7
##   32 | 2337
##   34 | 250077
##   36 | 0000823577
##   38 | 2333335582225577
##   40 | 0000003357788888002233555577778
##   42 | 03335555778800233333555577778
##   44 | 02222335557780000000023333357778888
##   46 | 0000233357700000023578
##   48 | 00000022335800333
##   50 | 0370
```

---

# Time Plots

<img src="data:image/png;base64,#../img/minard.png" width="30%" style="display: block; margin: auto;" />

--

- Edward Tufte has said that Minard's plot:

> "may well be the best statistical graphic ever drawn"

- It packs a ton of information into one dense figure.
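
Minard's figure is far richer than an everyday time plot. As a minimal sketch of the basic form, a single variable plotted against time, the example below assumes the built-in `AirPassengers` monthly series, which is not part of these slides; the name `passengers` is illustrative only.

```r
# Sketch: a basic time plot in base R
# (AirPassengers is a built-in monthly time series; its use here is an assumption)
passengers <- AirPassengers

# plot() on a time series object draws the values against time as a line
plot(passengers, main = "Monthly Airline Passengers",
     xlab = "Year", ylab = "Passengers (thousands)")
```

Time goes on the x-axis and the measured variable on the y-axis, so trends and seasonal patterns can be read left to right.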
---

# Time Plots

<img src="data:image/png;base64,#../img/minard.png" width="90%" style="display: block; margin: auto;" />

---

- The plot contains six variables, each mapped to a different aesthetic:

| Information                           | Aesthetic       |
|---------------------------------------|-----------------|
| Size of Napoleon's Grande Armée       | Width of path   |
| Longitude of the army's position      | x-axis          |
| Latitude of the army's position       | y-axis          |
| Direction of the army's movement      | Color of path   |
| Date of points along retreat path     | Text below plot |
| Temperature during the army's retreat | Line below plot |

---

# Recreation in R

- This plot has been recreated in R by:
  - [Andrew Heiss](https://www.andrewheiss.com/blog/2017/08/10/exploring-minards-1812-plot-with-ggplot2/)
  - [Michael Friendly](http://www.datavis.ca/gallery/re-minard.php)
  - [Hadley Wickham](https://www.tandfonline.com/doi/suppl/10.1198/jcgs.2009.07098?scroll=top)

---

# Side by Side

.pull-left[
<img src="data:image/png;base64,#../img/minard.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-22-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# More Accessible Resources

- [R Graph Catalog](http://shinyapps.stat.ubc.ca/r-graph-catalog/)
- [intRo](http://www.intro-stats.com/)
- .hand-pink[[Data Science for Psychologists](https://datascience4psych.github.io/DataScience4Psych/)]

---
class: center, middle

Wrapping Up...
---

# R Basics

---

# R Basics

- R is a powerful language for statistical computing and graphics
- It's free, open-source, and has a large community of users and developers

---

# R Installation and Setup

- Download R from CRAN (Comprehensive R Archive Network): https://cran.r-project.org/
- Choose the version for your operating system
- RStudio is a popular IDE for R, available at: https://www.rstudio.com/products/rstudio/download/

---

# R Console

- After installation, you can start R or RStudio
- You'll see a console where you can enter commands
- Try a simple calculation:

```r
2 + 3
```

```
## [1] 5
```

---

# Variable Assignment

- Use `<-` or `=` to assign values to variables
- Variable names should be descriptive and use `under_scores` or `camelCase`

```r
x <- 10
y = 20
total_sum <- x + y
totalSum <- x + y
print(total_sum)
```

```
## [1] 30
```

---

# Basic Data Types

- Numeric: 1.5, 2, 3.14
- Character: "hello", 'world'
- Logical: TRUE, FALSE

---

# Vectors

- Create vectors using the `c()` function
- A vector can only contain elements of the same type

```r
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("a", "b", "c")
print(numeric_vector)
```

```
## [1] 1 2 3 4 5
```

---

# Basic Plotting in R

.pull-left[
- R has built-in plotting functions
- These are useful for quick visualizations
]

.pull-right[

```r
x <- 1:10
y <- x^2
plot(x, y, main="Simple Plot",
     xlab="X axis", ylab="Y axis")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-27-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Using Functions

.pull-left-narrow[
- R has many built-in functions
- You can also create your own functions
]

.pull-right-wide[

```r
# Built-in function
mean(c(1, 2, 3, 4, 5))
```

```
## [1] 3
```

```r
# Custom function
square <- function(x) {
  return(x^2)
}
square(4)
```

```
## [1] 16
```
]

---

# Comments

- All text after the pound sign "#" within the same line is considered a comment.

    > 1 + 1      # this is a comment
    [1] 2

---

# Getting Help

- R provides extensive documentation.
- Use `?` or `help()` to get information about functions
- For example, entering `?c` or `help(c)` at the prompt gives documentation of the function `c` in R.
- Example:

---

# Getting Help

    > help(c)

If you are not sure about the name of the function you are looking for, you can perform a fuzzy search with the `apropos` function.

    > apropos("nova")
    [1] "anova"     "anova.glm" ....

---

# Wrapping Up...